11 research outputs found

    Naive possibilistic classifiers for imprecise or uncertain numerical data

    Get PDF
    In real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in data representation. Here the possibility distributions used are assumed to encode the family of Gaussian probability distributions compatible with the considered dataset. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of available data. We first adapt the possibilistic classification model, previously proposed for the certain case, in order to accommodate the uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The reported experiments demonstrate the value of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility transform-based classifier behaves robustly when dealing with imperfect data.
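
    The probability-to-possibility transformation underlying this classifier can be illustrated on a single Gaussian attribute. Below is a minimal sketch in Python, assuming the standard "optimal" transform for unimodal symmetric densities (the function name and the use of scipy are illustrative choices, not the paper's code):

    import numpy as np
    from scipy.stats import norm

    def gaussian_to_possibility(x, mu, sigma):
        # For a symmetric unimodal density, the tightest possibility
        # distribution dominating the probability measure is
        #   pi(x) = P(|X - mu| >= |x - mu|) = 2 * (1 - Phi(|x - mu| / sigma)).
        z = np.abs(x - mu) / sigma
        return 2.0 * norm.sf(z)  # sf(z) = 1 - Phi(z)

    # pi equals 1 at the mode and decays towards 0 in the tails:
    print(gaussian_to_possibility(0.0, mu=0.0, sigma=1.0))   # 1.0
    print(gaussian_to_possibility(1.96, mu=0.0, sigma=1.0))  # ~0.05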

    Possibilistic classifiers for numerical data

    Get PDF
    Naive Bayesian Classifiers, which rely on independence hypotheses, together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic in the case of poor data. In such a situation, possibility distributions may provide a more faithful representation of these data. Naive Possibilistic Classifiers (NPCs), based on possibility theory, have recently been proposed as a counterpart of Bayesian classifiers to deal with classification tasks. Only a few works treat possibilistic classification, and most existing NPCs deal only with categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second is based on a direct interpretation of data in possibilistic formats that exploit an idea of proximity between data values in different ways, which provides a less constrained representation of them. We show that possibilistic classifiers have a better capability than Bayesian classifiers to detect new instances for which the classification is ambiguous, since probabilities may be poorly estimated and illusorily precise. Moreover, we propose, in this case, a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases. The reported experiments demonstrate the value of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well on data satisfying the normality assumption, while proximity-based possibilistic classifiers outperform the others elsewhere. The hybrid possibilistic classification exhibits a good ability to improve accuracy.
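
    The hybrid scheme mentioned at the end can be pictured as an ambiguity test followed by a nearest-neighbour fallback. A hedged sketch (the ratio threshold, the Euclidean distance, and the pi_per_class input are illustrative assumptions, not the paper's exact settings):

    import numpy as np

    def classify_hybrid(pi_per_class, x, train_X, train_y, ratio=0.9):
        # pi_per_class: dict mapping each class (at least two) to its
        # possibility degree for instance x. If the two most plausible
        # classes are nearly indistinguishable, fall back to the class
        # of the nearest training example.
        ranked = sorted(pi_per_class.items(), key=lambda kv: -kv[1])
        (best_c, best_p), (_, second_p) = ranked[0], ranked[1]
        if best_p > 0 and second_p / best_p >= ratio:
            d = np.linalg.norm(np.asarray(train_X) - np.asarray(x), axis=1)
            return train_y[int(np.argmin(d))]
        return best_c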

    Possibilistic Classifiers for Certain and Uncertain Numerical Data

    No full text
    This thesis falls within the framework of machine learning and concerns the study of a variety of classification methods for numerical data.

    The first issue addressed in this work is rule-based classification. Rule induction algorithms suffer from two main drawbacks when classifying test examples: (i) the multiple classification problem, when many rules cover an example but are associated with different classes, and (ii) the choice of a default class, which concerns the non-covering case. Our first contribution is a family of Possibilistic Rule-based Classifiers (PRCs), an extension and modification of the PART algorithm, to deal with such problems. The PRCs keep the same rule learning step as PART, but differ in other respects. In particular, the PRCs learn fuzzy rules instead of crisp rules, consider weighted rules at deduction time in an unordered manner instead of rule lists, and reduce the number of examples not covered by any rule by using a fuzzy rule set with large supports. The experiments reported show that the PRCs improve the accuracy of the classical PART algorithm.

    On the other hand, Naive Bayesian Classifiers (NBCs), which rely on independence hypotheses together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic in the case of poor data. In such a situation, possibility distributions may provide a more faithful representation of these data.

    A second contribution of this thesis focuses on the estimation of possibility distributions for continuous data. For this purpose we investigate two families of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability-possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes and gives rise to the Naive Possibilistic Classifier (NPC) and the Flexible Naive Possibilistic Classifier (FNPC). In the same context, we also use a probability-possibility transformation method enabling us to derive a possibility distribution as a family of Gaussian distributions, and we propose two further classifiers, the NPC-2 and FNPC-2, which take into account the confidence intervals of the Gaussian distributions. The second family of possibilistic classifiers abandons the normality assumption in favour of a direct representation of the data. In this context we propose two classifiers named the Fuzzy Histogram Classifier (FuHC) and the Nearest Neighbor-based Possibilistic Classifier (NNPC). Both exploit an idea of proximity between attribute values in order to estimate possibility distributions: the former accumulates proximities additively, while the latter relies only on the analysis of proximities, without counting them.

    The last issue addressed in this thesis concerns the classification of data with continuous input variables in the presence of uncertainty. We extend the possibilistic classifiers previously proposed for numerical data in order to cope with uncertainty in data representation. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modelled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. We first adapt the possibilistic classification model, previously proposed for the certain case, to accommodate the uncertainty about class labels. Then we propose an extension principle-based algorithm to deal with imprecise attribute values. Possibilistic classifiers are compared to classical or flexible Bayesian classifiers on a collection of benchmark databases. The experiments reported show the efficiency of possibilistic classifiers in dealing with certain or uncertain data. In particular, the probability-to-possibility transform-based classifiers behave robustly when dealing with imperfect data.
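
    The proximity idea behind FuHC and NNPC can be sketched as follows. Both estimates below are illustrative assumptions (the kernel shapes and bandwidth handling are not the thesis's exact definitions): NNPC looks only at the nearest observed value, while the fuzzy-histogram estimate accumulates proximities additively.

    import numpy as np

    def nnpc_possibility(v, class_values, bandwidth):
        # Possibility of attribute value v for a class: the closer v is to
        # some observed value of that class, the more possible it is; only
        # the distance to the nearest neighbour counts.
        d_min = np.min(np.abs(np.asarray(class_values, dtype=float) - v))
        return float(np.exp(-d_min / bandwidth))

    def fuhc_possibility(v, class_values, bandwidth):
        # Fuzzy-histogram-style estimate: triangular proximities to all
        # observed values are summed, then normalised to [0, 1].
        vals = np.asarray(class_values, dtype=float)
        weights = np.maximum(0.0, 1.0 - np.abs(vals - v) / bandwidth)
        return float(weights.sum() / len(vals))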

    Analogical classification: A new way to deal with examples

    Get PDF
    Introduced a few years ago, analogy-based classification methods are a noticeable addition to the set of lazy learning techniques. They provide remarkably accurate results on many classical datasets. They look for all triples of examples in the training set that are in analogical proportion with the item to be classified on a maximal number of attributes and for which the corresponding analogical proportion equation on the class has a solution. In this paper, we demonstrate a new approach that, when classifying a new item, focuses on only a small part of the available triples. To restrict the scope of the search, we first look for examples that are as similar as possible to the new item to be classified. We then only consider the pairs of examples presenting the same dissimilarity as between the new item and one of its closest neighbors. Thus we implicitly build triples that are in analogical proportion with the new item on all attributes. Then the classification is made on the basis of a majority vote on the pairs leading to a solvable class equation. This new algorithm provides results as good as other analogical classifiers with a lower average complexity.
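
    For 0/1-encoded attributes, an analogical proportion a : b :: c : d reads "a differs from b as c differs from d", and the class equation a : b :: c : x is solvable only when that pattern determines x. A minimal sketch under these standard Boolean definitions (the surrounding vote is omitted):

    def analogy_holds(a, b, c, d):
        # Boolean analogical proportion, checked component-wise:
        # a : b :: c : d holds iff a - b == c - d on every attribute.
        return all(ai - bi == ci - di for ai, bi, ci, di in zip(a, b, c, d))

    def solve_class_equation(ca, cb, cc):
        # Solve ca : cb :: cc : x for 0/1 class labels, if possible.
        x = cb + cc - ca  # the only candidate consistent with the proportion
        return x if x in (0, 1) else None  # e.g. 1 : 0 :: 0 : x has no solution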

    Tordos and twisters

    Get PDF
    The Ginger and Fred office building of the Nationale Nederlanden Insurance Company in Prague, Czech Republic (1996) is by the American architect Frank O. Gehry. The nickname, referring to the dancing couple Ginger Rogers and Fred Astaire, was inspired by the two elegantly curved, intimate volumes in the façade. The transparent one resembles a tailored ballroom gown; it is facetted with flat quadrangular sheets of glass. Fred is formed from precast concrete panels. No freely curved window panes were available at the time, so Gehry reverted to rectangular window frames that protrude at varying distances from the rendering over the concrete elements.

    Possibilistic Classifiers for Uncertain Numerical Data

    No full text
    In many real-world problems, input data may be pervaded with uncertainty. Naive possibilistic classifiers have been proposed as a counterpart to Bayesian classifiers to deal with classification tasks in the presence of uncertainty. Following this line, we extend here possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in data representation. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. We first adapt the possibilistic classification model, previously proposed for the certain case, in order to accommodate the uncertainty about class labels. Then, we propose an extension principle-based algorithm to deal with imprecise attribute values. The reported experiments demonstrate the value of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility transform-based classifier behaves robustly when dealing with imperfect data.
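
    For an interval-valued attribute, the extension principle reduces to taking the supremum of the possibility distribution over the interval; with a unimodal distribution derived from a Gaussian, that supremum is reached at the point of the interval closest to the mode. A hedged sketch reusing the standard Gaussian-derived transform (the helper name is illustrative, not the paper's):

    import numpy as np
    from scipy.stats import norm

    def interval_possibility(lo, hi, mu, sigma):
        # sup over [lo, hi] of pi(x) = 2 * (1 - Phi(|x - mu| / sigma)).
        # pi is unimodal with mode mu, so the supremum is attained at mu
        # when mu lies inside the interval, else at the nearest endpoint.
        x_star = float(np.clip(mu, lo, hi))
        return 2.0 * norm.sf(abs(x_star - mu) / sigma)

    print(interval_possibility(-1.0, 1.0, mu=0.0, sigma=1.0))  # 1.0 (mode inside)
    print(interval_possibility(2.0, 3.0, mu=0.0, sigma=1.0))   # pi(2.0), ~0.046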

    A Possibilistic Rule-Based Classifier

    No full text
    Rule induction algorithms have gained high popularity among machine learning techniques due to the "intelligibility" of their output, compared to other "black-box" classification methods. However, they suffer from two main drawbacks when classifying test examples: (i) the multiple classification problem, when many rules cover an example and are associated with different classes, and (ii) the choice of a default class, which concerns the non-covering case. In this paper we propose a family of Possibilistic Rule-based Classifiers (PRCs), an extension and modification of Frank and Witten's PART algorithm, to deal with such problems. The PRCs keep the same rule learning step as PART, but differ in other respects. In particular, the PRCs learn fuzzy rules instead of crisp rules and consider weighted rules at deduction time in an unordered manner instead of rule lists. They also reduce the number of examples not covered by any rule by using a fuzzy rule set with large supports. The experiments reported show that the PRCs improve the accuracy of the classical PART algorithm.
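
    The unordered, weighted deduction step can be pictured as follows: every fuzzy rule fires to the degree its premise matches the example, that degree is scaled by the rule's weight, and the scaled degrees are aggregated per class. A minimal sketch, assuming max-aggregation and a (membership, weight, label) rule encoding, neither of which is taken from the paper:

    def classify_with_fuzzy_rules(example, rules):
        # rules: list of (membership_fn, weight, class_label), where
        # membership_fn maps an example to a matching degree in [0, 1].
        # Unlike a decision list, all rules are consulted.
        scores = {}
        for membership, weight, label in rules:
            degree = membership(example) * weight
            scores[label] = max(scores.get(label, 0.0), degree)
        if not scores or max(scores.values()) == 0.0:
            return None  # non-covering case, made rarer by large fuzzy supports
        return max(scores, key=scores.get)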

    From Bayesian classifiers to possibilistic classifiers for numerical data

    No full text
    Naïve Bayesian classifiers are well known for their simplicity and efficiency. They rely on independence hypotheses, together with a normality assumption, which may be too demanding when dealing with numerical data. Possibility distributions are more compatible with the representation of poor data. This paper investigates two kinds of possibilistic elicitation methods that will be embedded into possibilistic naïve classifiers. The first is derived from a probability-possibility transformation of Gaussian distributions (or mixtures of them), which introduces some further tolerance. The second kind is based on a direct interpretation of data in fuzzy histogram or possibilistic formats that exploit an idea of proximity between attribute values in different ways. Besides, possibilistic classifiers may be allowed to leave the classification open between several classes in case of insufficient information for choosing one (which may be of interest when the number of classes is large). The reported experiments demonstrate the value of possibilistic classifiers.
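
    Leaving the classification open can be implemented by returning every class whose plausibility stays within a factor of the best one. A sketch, assuming a simple ratio threshold alpha (the paper does not specify this particular rule):

    def open_classification(pi_per_class, alpha=0.9):
        # Return the set of classes that remain plausible enough.
        # A singleton is a confident decision; a larger set makes the
        # ambiguity explicit instead of forcing an arbitrary choice.
        best = max(pi_per_class.values())
        return {c for c, p in pi_per_class.items() if p >= alpha * best}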

    Naive Possibilistic Classifiers for Imprecise or Uncertain Numerical Data

    No full text
    In real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in data representation. Here the possibility distributions used are assumed to encode the family of Gaussian probability distributions compatible with the considered data set. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of available data. We first adapt the possibilistic classification model, previously proposed for the certain case, in order to accommodate the uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The reported experiments demonstrate the value of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility transform-based classifier behaves robustly when dealing with imperfect data.